Abstract

In this paper we study abstract shapes of $k$-noncrossing, $\sigma$-canonical RNA
pseudoknot structures. We consider $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes, which represent a
generalization of the abstract $\pi'$- and $\pi$-shapes of RNA secondary structures
introduced by Giegerich et al. Using a novel approach we compute the
generating functions of $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes as well as the generating functions
of all $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes induced by all $k$-noncrossing, $\sigma$-canonical RNA
structures for fixed $n$. By means of singularity analysis of the generating
functions, we derive explicit asymptotic expressions.
1. Introduction

Pseudoknots have long been known as important structural elements [33],
see Fig. 1. They represent cross-serial interactions between RNA nucleotides
and are functionally important in tRNAs, RNaseP [?], telomerase RNA [25],
and ribosomal RNAs [?]. Pseudoknots in plant virus RNAs mimic
tRNA structures, and in vitro selection experiments have produced
pseudoknotted RNA families that bind to the HIV-1 reverse transcriptase [27].
Important general mechanisms, such as ribosomal frame shifting, are dependent
upon pseudoknots [?].
Figure 1: The pseudoknot structure of the PrP-encoding mRNA.



Despite their biological importance, pseudoknots are typically excluded
from large-scale computational studies. Although the problem has attracted
considerable attention in the last decade, and several software tools [8], [23]
have become available, the required resources have remained prohibitive for
applications beyond individual molecules.

Figure 2: The Sprinzl tRNA RD7550 secondary structure represented as a planar
graph (top), 2-noncrossing diagram (middle) and Motzkin-path (bottom), where
up/down/horizontal-steps correspond to start/end/unpaired vertices, respectively.

An RNA molecule is a sequence of the four nucleotides A, G, U and C
together with the Watson-Crick (A-U, G-C) and U-G base pairing rules.
The sequence of bases is called the primary structure of the RNA molecule.
Two bases in the primary structure which are not adjacent may form hydrogen
bonds following the Watson-Crick base pairing rules. Three decades
ago Waterman et al. [13], [22], [31] analyzed RNA secondary structures.
Secondary structures are coarse grained RNA contact structures. They can be
represented as diagrams, planar graphs as well as Motzkin-paths, see Fig. 2.
Diagrams are labeled graphs over the vertex set $[n] = \{1, \dots, n\}$ with
vertex degrees $\le 1$, represented by drawing the vertices on a horizontal line
and the arcs $(i,j)$, $i<j$, in the upper half-plane, see Fig. 2 and Fig. 3.
Here, vertices and arcs correspond to the nucleotides A, G, U and C and
Watson-Crick (A-U, G-C) and (U-G) base pairs, respectively. In a diagram
two arcs $(i_1,j_1)$ and $(i_2,j_2)$ are called crossing if $i_1<i_2<j_1<j_2$
holds. Accordingly, a $k$-crossing is a sequence of arcs $(i_1,j_1),\dots,(i_k,j_k)$ such
that $i_1<i_2<\dots<i_k<j_1<j_2<\dots<j_k$, see Fig. 3. We call diagrams
containing at most $(k-1)$-crossings $k$-noncrossing diagrams ($k$-noncrossing
partial matchings).
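The definition of a $k$-crossing can be checked mechanically on small diagrams. The following is a minimal brute-force sketch in Python, assuming a diagram is given as a list of arcs $(i,j)$ with $i<j$; the function names `is_k_crossing`, `max_crossing` and `is_k_noncrossing` are hypothetical helpers introduced here for illustration and are not part of the paper.

```python
from itertools import combinations

def is_k_crossing(arcs):
    """True if the given arcs satisfy i1 < ... < ik < j1 < ... < jk."""
    arcs = sorted(arcs)                      # sort by left endpoint
    lefts = [i for i, _ in arcs]
    rights = [j for _, j in arcs]
    return (lefts[-1] < rights[0]
            and all(a < b for a, b in zip(rights, rights[1:])))

def max_crossing(arcs):
    """Size of the largest set of mutually crossing arcs (brute force)."""
    for size in range(len(arcs), 1, -1):
        if any(is_k_crossing(subset) for subset in combinations(arcs, size)):
            return size
    return 1 if arcs else 0

def is_k_noncrossing(arcs, k):
    """A diagram is k-noncrossing if it contains at most (k-1)-crossings."""
    return max_crossing(arcs) < k
```

The search is exponential in the number of arcs and is only meant for hand-sized examples: a secondary structure passes `is_k_noncrossing(arcs, 2)`, while an H-type pseudoknot such as `[(1, 7), (2, 6), (4, 10), (5, 9)]` is 3-noncrossing but not 2-noncrossing.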

An important observation in this context is that RNA secondary structures
have no crossings in their diagram representation, see Fig. 3 (l.h.s.)
and Fig. 2, and are therefore 2-noncrossing diagrams. The length of an arc
$(i,j)$ is given by $j-i$, characterizing the minimal length of a hairpin loop.
A stack of length $\sigma$ is a sequence of "parallel" arcs of the form

$$ ((i,j),(i+1,j-1),\dots,(i+(\sigma-1),j-(\sigma-1))). \tag{1} $$

In the context of minimum free energy pseudoknot structures [?] a minimum
stack length $\sigma$ of either two or three is stipulated. We remark that RNA
secondary structures are 2-noncrossing, 2-canonical diagrams, whose numbers
are asymptotically given by

$$ S_{2,2}(n) \sim c\, n^{-3/2}\, 1.96798^{n}, \qquad c>0. \tag{2} $$

We call an arc of length one a 1-arc. A $k$-noncrossing, $\sigma$-canonical RNA
structure is a $k$-noncrossing diagram without 1-arcs, having a minimum stack-size
of $\sigma$.



Figure 3: A 2-noncrossing, 2-canonical RNA structure (left) and a 3-noncrossing,
2-canonical RNA structure (right).



The efficient minimum free energy (mfe) folding of secondary structures
is a consequence of the following relation of the numbers of RNA secondary
structures over $n$ nucleotides, $S_2(n)$ [?],

$$ S_2(n) = S_2(n-1) + \sum_{j=0}^{n-3} S_2(n-2-j)\,S_2(j), \tag{3} $$

where $S_2(n) = 1$ for $0 \le n \le 2$. Accordingly, RNA secondary structures
satisfy a constructive recursion. This relation is the
key for deriving the fundamental DP-recursions used for the polynomial time
folding of secondary structures [?], and has therefore profound algorithmic
implications. In addition, eq. (3) is of central importance for the analysis of
abstract shapes [21]. Moreover, for a given RNA sequence, we have not
only one but an ensemble of structures, quantified via the partition function
generated by the (Boltzmann weighted) probability space of all structures
[19]. In view of the fact that the number of the mfe and suboptimal foldings
of an RNA sequence is large, Giegerich et al. [4] introduced the notion of
abstract shapes of secondary structures. Two particularly important shape
levels are level-1 ($\pi'$-) and level-5 ($\pi$-) shapes, which were studied in [?].
In [28] the authors compute the probability of a shape by means of the
partition function, where the probability of a shape is the induced probability
of all the structures inducing it.
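The recursion for $S_2(n)$ can be cross-checked against a direct enumeration for small $n$. The sketch below is an illustration only: it assumes the reading of the recursion in which the inner sum runs over $j = 0, \dots, n-3$ (so that the arc closed at position $n$ has length at least two), and the helper names `s2`, `partial_matchings`, `is_secondary` and `brute` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def s2(n):
    """Number of secondary structures on n vertices via the recursion."""
    if n <= 2:
        return 1
    # vertex n unpaired, or paired with j vertices to the left of its partner
    return s2(n - 1) + sum(s2(n - 2 - j) * s2(j) for j in range(0, n - 2))

def partial_matchings(vertices):
    """Yield every set of arcs on the given sorted vertex list."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    yield from partial_matchings(rest)               # first stays unpaired
    for idx, partner in enumerate(rest):
        for arcs in partial_matchings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + arcs

def is_secondary(arcs):
    """No 1-arcs and no pair of crossing arcs."""
    no_1arc = all(j - i >= 2 for i, j in arcs)
    no_cross = not any(c < b < d
                       for (a, b), (c, d) in combinations(sorted(arcs), 2))
    return no_1arc and no_cross

def brute(n):
    """Count secondary structures on n vertices by exhaustive enumeration."""
    return sum(is_secondary(m) for m in partial_matchings(list(range(1, n + 1))))
```

Under these assumptions the recursion and the enumeration agree for all lengths that are feasible to enumerate; the recursion itself runs in polynomial time, which is the point made in the text.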

The problem with pseudoknotted structures is that they do not satisfy
a recursion of the type of eq. (3), rendering the ab initio folding into mfe
configurations [?], [17] as well as the derivation of any other properties a
nontrivial task. Here, we generalize the $\pi'$- and $\pi$-shapes of [3], by introducing
$\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes, see Fig. 4. Our results are not new in case of $k = 2$,
since we have $\mathsf{lv}_2^1 = \pi'$ and $\mathsf{lv}_2^5 = \pi$. In two beautiful papers [16], [21]
$\pi'$- and $\pi$-shapes have been analyzed. The results of [16], [21] explicitly make
use of the constructive recurrence relation given in eq. (3). Their approach
can consequently not be generalized to RNA pseudoknot structures, as the
latter are genuinely nonrecursive. Our framework therefore identifies the
combinatorial "heart" of the results of [16], [21] and provides a new approach
avoiding any notion of grammar or recursiveness. The key idea behind the
construction of $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes is a projection onto so called $k$-noncrossing
core-structures.

Figure 4: $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes: a 3-noncrossing, 2-canonical RNA structure (top),
its $\mathsf{lv}_3^1$-shape (bottom left) and its $\mathsf{lv}_3^5$-shape (bottom right).

The paper is organized as follows: after introducing all necessary background
we give a detailed computation of the generating functions and study
their singularities. We derive simple asymptotic expressions for the numbers
of $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes as well as the numbers of these shapes induced by
$k$-noncrossing, $\sigma$-canonical RNA structures of fixed length $n$. Finally we put
our results into context.



2. Some basic facts

Let $f_k(n,\ell)$ denote the number of $k$-noncrossing diagrams on $n$ vertices
having exactly $\ell$ isolated vertices. A diagram without isolated points is called
a matching. The exponential generating function of $k$-noncrossing matchings
satisfies the following identity

$$ \sum_{n\ge 0} f_k(2n,0)\,\frac{z^{2n}}{(2n)!} = \det\left[I_{i-j}(2z)-I_{i+j}(2z)\right]\Big|_{i,j=1}^{k-1}, \tag{4} $$

where $I_r(2z)=\sum_{j\ge 0}\frac{z^{2j+r}}{j!\,(j+r)!}$ is the hyperbolic Bessel function of the first kind
of order $r$. Eq. (4) allows to conclude that the ordinary generating function

$$ F_k(z)=\sum_{n\ge 0} f_k(2n,0)\,z^{n} $$

is $D$-finite [?], i.e. there exists some $e \in \mathbb{N}$ such that

$$ q_{0,k}(z)\frac{d^{e}}{dz^{e}}F_k(z) + q_{1,k}(z)\frac{d^{e-1}}{dz^{e-1}}F_k(z) + \dots + q_{e,k}(z)F_k(z) = 0, \tag{5} $$

where $q_{j,k}(z)$ are polynomials. This follows since $I_r(2z)$ is $D$-finite by its definition and
$D$-finite power series are algebraically closed [?]. The key point is that any
singularity of $F_k(z)$ is contained in the set of roots of $q_{0,k}(z)$ [?], which we
denote by $R_k$. For $2 \le k \le 9$, we give the polynomials $q_{0,k}(z)$ and their roots
in Table 1.

In [?] we showed that for arbitrary $k$

$$ f_k(2n,0) \sim c_k\, n^{-((k-1)^2+(k-1)/2)}\,(2(k-1))^{2n}, \qquad c_k>0, \tag{6} $$

in accordance with the fact that $F_k(z)$ has the unique dominant singularity
$\rho_k^2$, where $\rho_k = 1/(2k-2)$.
k   q_{0,k}(z)                                                      R_k
2   (4z - 1) z                                                      {1/4}
3   (16z - 1) z^2                                                   {1/16}
4   (144z^2 - 40z + 1) z^3                                          {1/4, 1/36}
5   (1024z^2 - 80z + 1) z^4                                         {1/16, 1/64}
6   (14400z^3 - 4144z^2 + 140z - 1) z^5                             {1/4, 1/36, 1/100}
7   (147456z^3 - 12544z^2 + 224z - 1) z^6                           {1/16, 1/64, 1/144}
8   (2822400z^4 - 826624z^3 + 31584z^2 - 336z + 1) z^7              {1/4, 1/36, 1/100, 1/196}
9   (37748736z^4 - 3358720z^3 + 69888z^2 - 480z + 1) z^8            {1/16, 1/64, 1/144, 1/256}

Table 1: We present the polynomials $q_{0,k}(z)$ and their nonzero roots obtained by
the MAPLE package GFUN.

Let $\mathcal{T}_{k,\sigma}(n)$ denote the set of $k$-noncrossing, $\sigma$-canonical RNA structures
of length $n$ and let $\mathsf{T}_{k,\sigma}(n)$ denote their number. $\mathcal{T}_{k,\sigma}(n)$ can be identified
with the set of $k$-noncrossing RNA structures with each stack size $\ge \sigma$.
Furthermore, let $\mathcal{T}_{k,\sigma}(n,h)$ denote the set of $k$-noncrossing, $\sigma$-canonical RNA
structures of length $n$ with $h$ arcs, and set $\mathsf{T}_{k,\sigma}(n,h) = |\mathcal{T}_{k,\sigma}(n,h)|$. The
bivariate generating function of $\mathsf{T}_{k,1}(n,h)$ ($k \ge 2$) has been computed in [?]

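$$ \sum_{n\ge 0}\sum_{h\le n/2} \mathsf{T}_{k,1}(n,h)\,w^{h}z^{n} = \frac{1}{wz^2-z+1}\,F_k\!\left(\frac{wz^2}{(wz^2-z+1)^2}\right) \tag{7} $$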

and the generating function for $k$-noncrossing, $\sigma$-canonical RNA structures
is given by [11]

$$ \mathbf{T}_{k,\sigma}(z) = \sum_{n\ge 0}\mathsf{T}_{k,\sigma}(n)\,z^{n} = \frac{1}{u_0z^2-z+1}\,F_k\!\left(\left(\frac{\sqrt{u_0}\,z}{u_0z^2-z+1}\right)^{2}\right), \tag{8} $$

where $u_0 = \frac{(z^2)^{\sigma-1}}{z^{2\sigma}-z^2+1}$.

According to Pringsheim's Theorem [?], [26], each power series $f(z) = \sum_{n\ge 0} a_n z^n$
with nonnegative coefficients and a radius of convergence $R > 0$
has a positive real dominant singularity at $z = R$. This singularity plays
a key role for the asymptotics of the coefficients. The class of theorems
that deal with such deductions are called transfer theorems [?]. One key
ingredient in this framework is a specific domain in which the functions in
question are analytic, which is "slightly" bigger than their respective radius
of convergence. It is tailored for extracting the coefficients via Cauchy's
integral formula. Details on the method can be found in [?]. In case
of $D$-finite functions we have analytic continuation in any simply connected
domain containing zero [29] and all prerequisites of singularity analysis are
met. We use the notation

$$ \left(f(z) = O(g(z)) \text{ as } z\to\rho\right) \iff \left(\frac{f(z)}{g(z)} \text{ is bounded as } z\to\rho\right). \tag{9} $$

Let $[z^n]f(z)$ denote the $n$-th coefficient of the power series $f(z)$ at $z = 0$.

Theorem 2.1. [?] Let $f(z), g(z)$ be $D$-finite functions with unique dominant
singularity $\rho$ and suppose

$$ f(z) = O(g(z)) \quad \text{as } z\to\rho. \tag{10} $$

Then we have

$$ [z^n]f(z) = C\left(1 - O\!\left(\frac{1}{n}\right)\right)[z^n]g(z), \tag{11} $$

where $C$ is a constant.



Theorem 2.1 implies the following result, tailored for our functional equations.
It is a particular instance of the supercritical paradigm, where we have
the following situation: we are given a $D$-finite function $f(z)$ and an algebraic
function $g(u)$ satisfying $g(0) = 0$. Furthermore we suppose that $f(g(u))$
has the unique real valued dominant singularity $\gamma$ and $g$ is regular in a disc
with radius slightly larger than $\gamma$. The supercritical paradigm then stipulates
that the subexponential factors of $f(g(u))$ at $u = 0$ coincide with those of
$f(z)$.

Proposition 1. Suppose $\vartheta_\sigma(z)$ is an algebraic function, analytic for $|z| < \delta$
and satisfies $\vartheta_\sigma(0) = 0$. Suppose further $\gamma_{k,\sigma} < \delta$ is the real unique dominant
singularity of $F_k(\vartheta_\sigma(z))$ and satisfies $\vartheta_\sigma(\gamma_{k,\sigma}) = \rho_k^2$. Then

$$ [z^n]\,F_k(\vartheta_\sigma(z)) \sim c_k\,n^{-((k-1)^2+(k-1)/2)}\,\left(\gamma_{k,\sigma}^{-1}\right)^{n}. \tag{12} $$

Let $\mathcal{G}_k(n,m)$ denote the set of the $k$-noncrossing matchings of length $2n$
with $m$ 1-arcs. In our first lemma, we will compute the bivariate generating
function of $g_k(n,m)$, i.e. the number of $k$-noncrossing matchings of length $2n$
with exactly $m$ 1-arcs.

Lemma 2.2. Suppose $k,n,m \in \mathbb{N}$, $k \ge 2$, $0 \le m \le n$. Then $g_k(n,m)$
satisfies the recursion

$$ (m+1)\,g_k(n+1,m+1) = (m+1)\,g_k(n,m+1) + (2n+1-m)\,g_k(n,m). \tag{13} $$

Furthermore, the generating function $G_k(x,y) = \sum_{n\ge 0}\sum_{m=0}^{n} g_k(n,m)\,x^n y^m$
is given by

$$ G_k(x,y) = \frac{1}{x+1-yx}\,F_k\!\left(\frac{x}{(x+1-yx)^2}\right). \tag{14} $$

Proof. Choose a $k$-noncrossing matching $\delta \in \mathcal{G}_k(n+1,m+1)$ and label one
1-arc. We have $(m+1)\,g_k(n+1,m+1)$ different such labeled $k$-noncrossing
matchings. On the other hand, in order to obtain such a labeled matching, we
can also insert one labeled 1-arc in a $k$-noncrossing matching $\delta' \in \mathcal{G}_k(n,m+1)$.
In this case, we can only put it inside one original 1-arc in $\delta'$ in order to
preserve the number of the 1-arcs. We may also insert a labeled 1-arc in
a $k$-noncrossing matching $\delta'' \in \mathcal{G}_k(n,m)$. In this case, we can only insert
the 1-arc between two vertices not forming a 1-arc. Therefore, we arrive at
$(m+1)\,g_k(n,m+1) + (2n+1-m)\,g_k(n,m)$ different such labeled matchings
and

$$ (m+1)\,g_k(n+1,m+1) = (m+1)\,g_k(n,m+1) + (2n+1-m)\,g_k(n,m). \tag{15} $$

This recursion implies the following partial differential equation for the
generating function

$$ x^{-1}\frac{\partial G_k(x,y)}{\partial y} = \frac{\partial G_k(x,y)}{\partial y} + 2x\frac{\partial G_k(x,y)}{\partial x} + G_k(x,y) - y\frac{\partial G_k(x,y)}{\partial y}, \tag{16} $$

whose general solution is given by

$$ G_k(x,y) = \frac{F\!\left(\frac{yx-1-x}{\sqrt{x}}\right)}{\sqrt{x}}, \tag{17} $$

where $F(z)$ is an arbitrary function. By definition, we have $\sum_{m=0}^{n} g_k(n,m) =
f_k(2n,0)$ and

$$ G_k(x,1) = \sum_{n\ge 0} f_k(2n,0)\,x^{n}. \tag{18} $$

Using eq. (16) and eq. (18) we derive

$$ G_k(x,y) = \frac{1}{x+1-yx}\sum_{n\ge 0} f_k(2n,0)\left(\frac{x}{(x+1-yx)^2}\right)^{n}, \tag{19} $$

whence the lemma. □
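The recursion of Lemma 2.2 lends itself to a brute-force sanity check for small parameters. The following Python sketch enumerates all perfect matchings on $2n$ vertices, keeps the $k$-noncrossing ones and tabulates them by their number of 1-arcs. It is an illustration under the definitions above, not the authors' method, and the names `perfect_matchings`, `has_k_crossing`, `g_table` and `recursion_holds` are hypothetical.

```python
from itertools import combinations
from collections import Counter

def perfect_matchings(vertices):
    """Yield every perfect matching (list of arcs) on the sorted vertex list."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        for arcs in perfect_matchings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + arcs

def has_k_crossing(arcs, k):
    """True if some k arcs satisfy i1 < ... < ik < j1 < ... < jk."""
    for subset in combinations(sorted(arcs), k):
        rights = [j for _, j in subset]
        if subset[-1][0] < rights[0] and rights == sorted(rights):
            return True
    return False

def g_table(k, n_max):
    """g[(n, m)]: k-noncrossing matchings on 2n vertices with m 1-arcs."""
    g = Counter()
    for n in range(n_max + 1):
        for arcs in perfect_matchings(list(range(1, 2 * n + 1))):
            if not has_k_crossing(arcs, k):
                g[(n, sum(1 for i, j in arcs if j == i + 1))] += 1
    return g

def recursion_holds(k, n_max):
    """Check the recursion for all 0 <= m <= n < n_max."""
    g = g_table(k, n_max)
    return all(
        (m + 1) * g[(n + 1, m + 1)]
        == (m + 1) * g[(n, m + 1)] + (2 * n + 1 - m) * g[(n, m)]
        for n in range(n_max) for m in range(n + 1))
```

The enumeration grows like $(2n-1)!!$, so it is only practical for a handful of arcs, which is enough to confirm the counting argument of the proof on small cases.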
3. Combinatorics of $\mathsf{lv}_k^5$-shapes

We now show how to derive the $\mathsf{lv}_k^5$-shape of a given $k$-noncrossing,
$\sigma$-canonical RNA structure. This construction is based on the notion of
$k$-noncrossing cores. A $k$-noncrossing core is a $k$-noncrossing RNA structure
in which each stack has size exactly one. The core of a $k$-noncrossing,
$\sigma$-canonical RNA structure, $\delta$, denoted by $c(\delta)$, is obtained in two steps: first
we map arcs and isolated vertices as follows:

$$ \forall\,\ell \ge \sigma-1:\ ((i-\ell,j+\ell),\dots,(i,j)) \mapsto (i,j) \quad\text{and}\quad j \mapsto j \ \text{ if } j \text{ is isolated} \tag{20} $$

and second we relabel the vertices of the resulting diagram from left to right
in increasing order, see Fig. 5. We are now in position to define $\mathsf{lv}_k^5$-shapes.

Figure 5: A 3-noncrossing core structure is obtained from a 3-noncrossing,
1-canonical RNA structure in two steps.



Definition 1. ($\mathsf{lv}_k^5$-shape) Given a $k$-noncrossing, $\sigma$-canonical RNA structure
$\delta$, its $\mathsf{lv}_k^5$-shape, $\mathsf{lv}_k^5(\delta)$, is obtained by first removing all isolated vertices
and second applying the core-map $c$.

Figure 6: Two methods for generating the $\mathsf{lv}_k^5$-shape. A 3-noncrossing, 2-canonical
RNA structure (top-left) is mapped in two ways into its $\mathsf{lv}_3^5$-shape (top-right).

Alternatively the $\mathsf{lv}_k^5$-shape can also be derived as follows: we first project
into the core $c(\delta)$, second, we remove all isolated vertices and third we apply
the core-map $c$ again, see Fig. 6. The second step is a projection from
$k$-noncrossing cores to $k$-noncrossing matchings and surjective, since for each
$k$-noncrossing matching $\alpha$, we can obtain a core structure by inserting isolated
vertices between any two arcs contained in some stack. By construction, $\mathsf{lv}_k^5$-shapes
do not preserve stack-lengths, interior loops and unpaired regions.
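The two steps of Definition 1 can be sketched directly on an arc list. The Python fragment below is a minimal illustration, assuming a structure is given as a list of arcs $(i,j)$ with $i<j$ (isolated vertices are simply the labels that occur in no arc); `relabel` and `lv5_shape` are hypothetical names chosen for this sketch.

```python
def relabel(arcs):
    """Renumber the arc endpoints 1, 2, 3, ... from left to right."""
    endpoints = sorted(v for arc in arcs for v in arc)
    rank = {v: r for r, v in enumerate(endpoints, start=1)}
    return sorted((rank[i], rank[j]) for i, j in arcs)

def lv5_shape(arcs):
    """Remove isolated vertices, then apply the core map."""
    paired_only = relabel(arcs)          # step 1: isolated vertices disappear
    present = set(paired_only)
    # step 2: keep the innermost arc (i, j) of every stack, i.e. drop every
    # arc that has a directly nested neighbour (i + 1, j - 1)
    innermost = [(i, j) for i, j in paired_only
                 if (i + 1, j - 1) not in present]
    return relabel(innermost)
```

For instance, two stacked hairpins collapse to `[(1, 2), (3, 4)]`, and an H-type pseudoknot with two stacks of size two collapses to the crossing pair `[(1, 3), (2, 4)]`. Note that an interior loop such as `[(1, 10), (3, 8)]` yields the single arc `[(1, 2)]`, in line with the remark that these shapes do not preserve interior loops.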

Let $\mathcal{I}_k(n,m)$ ($i_k(n,m)$) denote the set (number) of the $\mathsf{lv}_k^5$-shapes of length
$2n$ with $m$ 1-arcs and

$$ \mathbf{I}_k(z,u) = \sum_{n\ge 0}\sum_{m=0}^{n} i_k(n,m)\,z^{n}u^{m} \tag{21} $$

be the bivariate generating function. Furthermore, let $i_k(n)$ denote the number
of the $\mathsf{lv}_k^5$-shapes of length $2n$ with generating function

$$ \mathbf{I}_k(z) = \sum_{n\ge 0} i_k(n)\,z^{n}. \tag{22} $$

Since any $\mathsf{lv}_k^5$-shape is in particular the core of some $k$-noncrossing matching,
Lemma 2.2 allows us to establish a relation between the bivariate generating
function of $i_k(n,m)$ and the generating function $F_k(z)$.

Theorem 3.1. Let $k, n, m$ be natural numbers where $k \ge 2$, then the following
assertions hold

(a) the generating functions $\mathbf{I}_k(z,u)$ and $\mathbf{I}_k(z)$ satisfy

$$ \mathbf{I}_k(z,u) = \frac{1+z}{1+2z-zu}\,F_k\!\left(\frac{z(1+z)}{(1+2z-zu)^2}\right), \tag{23} $$

$$ \mathbf{I}_k(z) = F_k\!\left(\frac{z}{1+z}\right). \tag{24} $$

(b) for $2 \le k \le 9$, the number of $\mathsf{lv}_k^5$-shapes of length $2n$ is asymptotically
given by

$$ i_k(n) \sim c_k\,n^{-((k-1)^2+(k-1)/2)}\,\left(\mu_k^{-1}\right)^{n}, \tag{25} $$

where $\mu_k$ is the unique minimum positive real solution of $\frac{z}{1+z} = \rho_k^2$ and $c_k$ is
some positive constant.

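Proof. We first prove (a). For this purpose we define a map between
$k$-noncrossing matchings with $m$ 1-arcs and $\mathsf{lv}_k^5$-shapes

$$ g\colon \mathcal{G}_k(n,m) \to \dot{\bigcup}_{0\le b\le n-m}\left[\mathcal{I}_k(n-b,m)\times\left\{(a_j)_{1\le j\le n-b}\ \middle|\ \sum_{j=1}^{n-b}a_j = b,\ a_j\ge 0\right\}\right], $$

where $n \ge 1$. Here, for every $\delta \in \mathcal{G}_k(n,m)$, we have $g(\delta) = (c(\delta), (a_j)_{1\le j\le n-b})$,
where $c(\delta)$ is the core structure of $\delta$ obtained according to eq. (20) and where
$(a_j)_{1\le j\le n-b}$ keeps track of the deleted arcs. It is straightforward to check that
the map $g$ is well defined, since all the 1-arcs of $c(\delta)$ are just the 1-arcs of $\delta$.
By construction, $g$ is a bijection and we have

$$ \left|\left\{(a_j)_{1\le j\le n-b}\ \middle|\ \sum_{j=1}^{n-b}a_j = b,\ a_j\ge 0\right\}\right| = \binom{n-1}{b}. $$

Then we derive

$$ g_k(n,m) = \sum_{b=0}^{n-m}\binom{n-1}{b}\,i_k(n-b,m), \quad \text{for } n\ge 1, \tag{26} $$

which implies

$$ \sum_{n\ge 0}\sum_{m=0}^{n} g_k(n,m)\,x^{n}y^{m} = 1 + \sum_{n\ge 1}\sum_{m=0}^{n}\sum_{b=0}^{n-m}\binom{n-1}{b}\,i_k(n-b,m)\,x^{n}y^{m}. $$

We next observe

$$ \sum_{n\ge 1}\sum_{m=0}^{n}\sum_{b=0}^{n-m}\binom{n-1}{b}\,i_k(n-b,m)\,x^{n}y^{m} = \sum_{b\ge 0}\sum_{m\ge 0}\sum_{n\ge n_0}\binom{n-1}{b}\,i_k(n-b,m)\,x^{n}y^{m}, $$

where $n_0 = \max\{m+b, 1\}$ and setting $s = n-b$,

$$ \sum_{n\ge 1}\sum_{m=0}^{n}\sum_{b=0}^{n-m}\binom{n-1}{b}\,i_k(n-b,m)\,x^{n}y^{m} = \sum_{b\ge 0}\sum_{m\ge 0}\sum_{s\ge s_0}\binom{s+b-1}{b}\,i_k(s,m)\,x^{s+b}y^{m}, $$

where $s_0 = \max\{m, 1\}$. In view of

$$ \sum_{b\ge 0}\binom{s+b-1}{b}\,x^{b} = \left(\frac{1}{1-x}\right)^{s} $$

and interchanging the terms of summation, we derive

$$ \sum_{n\ge 1}\sum_{m=0}^{n}\sum_{b=0}^{n-m}\binom{n-1}{b}\,i_k(n-b,m)\,x^{n}y^{m} = \sum_{s\ge 1}\sum_{m=0}^{s} i_k(s,m)\left(\frac{x}{1-x}\right)^{s}y^{m} $$

and arrive at

$$ \sum_{n\ge 0}\sum_{m=0}^{n} g_k(n,m)\,x^{n}y^{m} = \sum_{n\ge 0}\sum_{m=0}^{n} i_k(n,m)\left(\frac{x}{1-x}\right)^{n}y^{m}. $$

According to Lemma 2.2 we have

$$ \sum_{n\ge 0}\sum_{m=0}^{n} g_k(n,m)\,x^{n}y^{m} = \frac{1}{x+1-yx}\sum_{n\ge 0} f_k(2n,0)\left(\frac{x}{(x+1-yx)^2}\right)^{n} $$

and, setting $z = \frac{x}{1-x}$ and $u = y$,

$$ \sum_{n\ge 0}\sum_{m=0}^{n} i_k(n,m)\,z^{n}u^{m} = \frac{1+z}{1+2z-zu}\sum_{n\ge 0} f_k(2n,0)\left(\frac{z(1+z)}{(1+2z-zu)^2}\right)^{n}. $$

In particular, setting $u = 1$, we derive

$$ \sum_{n\ge 0} i_k(n)\,z^{n} = \sum_{n\ge 0} f_k(2n,0)\left(\frac{z}{1+z}\right)^{n}, $$

whence (a) follows.

Assertion (b) is a direct consequence of the supercritical paradigm, see
Proposition 1. As mentioned before, the ordinary generating function $F_k(z) =
\sum_{n\ge 0} f_k(2n,0)\,z^{n}$ is $D$-finite [?] and the inner function $\vartheta(z) = \frac{z}{1+z}$ is algebraic,
satisfies $\vartheta(0) = 0$ and is analytic for $|z| < 1$. By direct calculation,
using the fact that all singularities of $F_k(z)$ are contained within the set of
zeros of $q_{0,k}(z)$, see Tab. 1, we can then verify that $F_k(\vartheta(z))$ has the unique
dominant real singularity $\mu_k < 1$ satisfying $\vartheta(\mu_k) = \rho_k^2$ for $2 \le k \le 9$. In
view of $f_k(2n,0) \sim c_k\,n^{-((k-1)^2+(k-1)/2)}\,(2(k-1))^{2n}$, Proposition 1 guarantees
eq. (25).

This proves (b) completing the proof of the theorem. □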
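Assertion (a) of Theorem 3.1 can be probed numerically for small $n$. The sketch below rests on two assumptions that are stated here rather than taken verbatim from the text: an $\mathsf{lv}_k^5$-shape of length $2n$ is read as a $k$-noncrossing matching on $2n$ vertices in which no arc $(i,j)$ has a directly nested neighbour $(i+1,j-1)$, and the coefficient of $z^n$ in $F_k(z/(1+z))$ is expanded as $\sum_{j=1}^{n}(-1)^{n-j}\binom{n-1}{j-1}f_k(2j,0)$. All function names are hypothetical.

```python
from itertools import combinations
from math import comb

def perfect_matchings(vertices):
    """Yield every perfect matching (list of arcs) on the sorted vertex list."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        for arcs in perfect_matchings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + arcs

def has_k_crossing(arcs, k):
    """True if some k arcs satisfy i1 < ... < ik < j1 < ... < jk."""
    for subset in combinations(sorted(arcs), k):
        rights = [j for _, j in subset]
        if subset[-1][0] < rights[0] and rights == sorted(rights):
            return True
    return False

def count_matchings(k, n, shapes_only=False):
    """Count k-noncrossing matchings on 2n vertices; optionally only those
    in which every stack has size one (the shapes)."""
    total = 0
    for arcs in perfect_matchings(list(range(1, 2 * n + 1))):
        if has_k_crossing(arcs, k):
            continue
        present = set(arcs)
        if shapes_only and any((i + 1, j - 1) in present for i, j in arcs):
            continue
        total += 1
    return total

def shapes_from_series(k, n):
    """Coefficient of z^n in F_k(z / (1 + z)), expanded term by term."""
    if n == 0:
        return 1
    return sum((-1) ** (n - j) * comb(n - 1, j - 1) * count_matchings(k, j)
               for j in range(1, n + 1))
```

For $k=2$ the series coefficients produced this way start 1, 1, 2, 4, 9, i.e. shifted Motzkin numbers, which is consistent with the remark in the introduction that the case $k=2$ reproduces known results on shapes of secondary structures.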
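We next study the number of $\mathsf{lv}_k^5$-shapes induced by $k$-noncrossing,
$\sigma$-canonical RNA structures of fixed length $n$, $\mathsf{lv}_{k,\sigma}^5(n)$, setting

$$ \mathbf{Lv}_{k,\sigma}^5(x) = \sum_{n\ge 0}\mathsf{lv}_{k,\sigma}^5(n)\,x^{n}. \tag{27} $$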
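Theorem 3.2. Let $k, \sigma \in \mathbb{N}$, where $k \ge 2$. Then the following assertions
hold

(a) the generating function $\mathbf{Lv}_{k,\sigma}^5(x)$ is given by

$$ \mathbf{Lv}_{k,\sigma}^5(x) = \frac{1+x^{2\sigma}}{(1-x)(1+2x^{2\sigma}-x^{2\sigma+1})}\,F_k\!\left(\frac{x^{2\sigma}(1+x^{2\sigma})}{(1+2x^{2\sigma}-x^{2\sigma+1})^2}\right). \tag{28} $$

(b) for $2 \le k \le 9$ and $1 \le \sigma \le 10$

$$ \mathsf{lv}_{k,\sigma}^5(n) \sim c_{k,\sigma}\,n^{-((k-1)^2+(k-1)/2)}\,\left(\zeta_{k,\sigma}^{-1}\right)^{n}, \tag{29} $$

where $c_{k,\sigma} > 0$ and $\zeta_{k,\sigma}$ is the unique minimum positive real solution of

$$ \frac{x^{2\sigma}(1+x^{2\sigma})}{(1+2x^{2\sigma}-x^{2\sigma+1})^2} = \rho_k^2. \tag{30} $$

sigma / k   2         3         4         5         6         7          8
1           1.51243   3.67528   5.77291   7.82581   9.85873   11.88118   13.89746
2           1.26585   1.93496   2.41152   2.80275   3.14338   3.44943    3.72983
3           1.17928   1.55752   1.80082   1.98945   2.14693   2.28376    2.40567

Table 2: The exponential growth rates $\zeta_{k,\sigma}^{-1}$ of $\mathsf{lv}_k^5$-shapes induced by $k$-noncrossing,
$\sigma$-canonical RNA structures of length $n$.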


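Proof. In order to prove (a) we observe that we can always inflate a
structure by adding arcs to stacks or isolated vertices without changing its
$\mathsf{lv}_k^5$-shape. In fact, for any given $\mathsf{lv}_k^5$-shape, $\beta$, adding the minimal number
of arcs to each stack such that every stack has $\sigma$ arcs, and inserting one
isolated vertex in any 1-arc, we derive a $k$-noncrossing, $\sigma$-canonical structure
having arc-length $\ge 2$, of minimal length. We can therefore derive $\mathbf{Lv}_{k,\sigma}^5(x)$,
see eq. (27), from the bivariate generating function $\mathbf{I}_k(z,u)$ as follows

$$ \mathbf{Lv}_{k,\sigma}^5(x) = \sum_{n\ge 0}\sum_{s=0}^{\lfloor n/(2\sigma)\rfloor}\sum_{m=0}^{\min\{s,\,n-2\sigma s\}} i_k(s,m)\,x^{n} = \sum_{s\ge 0}\sum_{m=0}^{s}\sum_{n\ge 2\sigma s+m} i_k(s,m)\,x^{n}, $$

whence

$$ \mathbf{Lv}_{k,\sigma}^5(x) = \frac{1}{1-x}\sum_{s\ge 0}\sum_{m=0}^{s} i_k(s,m)\,x^{2\sigma s+m} = \frac{1}{1-x}\,\mathbf{I}_k(x^{2\sigma},x) $$

and in view of eq. (23), $\mathbf{I}_k(z,u) = \frac{1+z}{1+2z-zu}\,F_k\!\left(\frac{z(1+z)}{(1+2z-zu)^2}\right)$, we derive

$$ \mathbf{Lv}_{k,\sigma}^5(x) = \frac{1+x^{2\sigma}}{(1-x)(1+2x^{2\sigma}-x^{2\sigma+1})}\,F_k\!\left(\frac{x^{2\sigma}(1+x^{2\sigma})}{(1+2x^{2\sigma}-x^{2\sigma+1})^2}\right). $$

As for (b), we observe that the factor

$$ \varphi_\sigma(x) = \frac{1+x^{2\sigma}}{(1-x)(1+2x^{2\sigma}-x^{2\sigma+1})} $$

does not induce a dominant singularity of $\mathbf{Lv}_{k,\sigma}^5(x)$. Therefore all dominant
singularities of $\mathbf{Lv}_{k,\sigma}^5(x)$ stem from $F_k\!\left(\frac{x^{2\sigma}(1+x^{2\sigma})}{(1+2x^{2\sigma}-x^{2\sigma+1})^2}\right)$. Indeed, assume a
contrario that there were some dominant singularity of $\mathbf{Lv}_{k,\sigma}^5(x)$, $\zeta$, that is
induced by $\varphi_\sigma(x)$. This would imply that $\zeta$ is also a dominant singularity of
$F_k\!\left(\frac{x^{2\sigma}(1+x^{2\sigma})}{(1+2x^{2\sigma}-x^{2\sigma+1})^2}\right)$, which immediately leads to a contradiction.
We next verify that for $2 \le k \le 9$ and $1 \le \sigma \le 10$, the minimum positive
real solution of eq. (30), $\zeta_{k,\sigma}$, is the unique dominant singularity of $\mathbf{Lv}_{k,\sigma}^5(x)$
and Proposition 1 implies

$$ \mathsf{lv}_{k,\sigma}^5(n) \sim c_{k,\sigma}\,n^{-((k-1)^2+(k-1)/2)}\,\left(\zeta_{k,\sigma}^{-1}\right)^{n}, $$

where $c_{k,\sigma}$ is some positive constant and the proof of the theorem is complete.

□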
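The growth rates of Table 2 can be recomputed from eq. (30) by a simple root search. The sketch below is not the procedure used in the paper (which refers to MAPLE); it assumes the reading of eq. (30) as $x^{2\sigma}(1+x^{2\sigma})/(1+2x^{2\sigma}-x^{2\sigma+1})^2=\rho_k^2$ with $\rho_k = 1/(2k-2)$, scans upward from $0$ in small steps for the first crossing, and then bisects. The function names and the step size are hypothetical choices.

```python
def shape_equation(x, sigma):
    """Left-hand side of eq. (30) as read above."""
    p = x ** (2 * sigma)
    return p * (1 + p) / (1 + 2 * p - p * x) ** 2

def smallest_positive_root(k, sigma, step=1e-3, tol=1e-12):
    """Smallest x > 0 with shape_equation(x, sigma) = rho_k^2.

    Assumes the first crossing is isolated at the resolution of `step`.
    """
    rho_sq = 1.0 / (2 * k - 2) ** 2
    lo = 0.0
    while shape_equation(lo + step, sigma) < rho_sq:   # scan for first crossing
        lo += step
    hi = lo + step
    while hi - lo > tol:                               # bisection refinement
        mid = (lo + hi) / 2
        if shape_equation(mid, sigma) < rho_sq:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def growth_rate(k, sigma):
    """Exponential growth rate, i.e. the reciprocal of the smallest root."""
    return 1.0 / smallest_positive_root(k, sigma)
```

The scan terminates before $x=1$ because the left-hand side equals $1/2$ there, which exceeds $\rho_k^2 \le 1/4$. Calling `growth_rate(2, 1)` or `growth_rate(3, 2)` gives values that agree with the corresponding entries of Table 2 to the displayed precision.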



4. Combinatorics of $\mathsf{lv}_k^1$-shapes

Definition 2. ($\mathsf{lv}_k^1$-shape) Given a $k$-noncrossing, $\sigma$-canonical RNA structure,
$\delta$, its $\mathsf{lv}_k^1$-shape, $\mathsf{lv}_k^1(\delta)$, is derived as follows: first we apply the core
map, second we replace a segment of isolated vertices by a single isolated
vertex and third relabel the vertices of the resulting diagram, see Fig. 7.

More formally, a $\mathsf{lv}_k^1$-shape is obtained as follows: if we have a maximal
sequence of isolated vertices $(i, i+1, \dots, i+\ell)$ (i.e. $i-1$, $i+\ell+1$ are
not isolated), then we map $(i, i+1, \dots, i+\ell) \mapsto i$ and if $(i,j)$ is an arc, it is
mapped identically.
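The three steps of Definition 2 can be sketched as follows. This is a minimal illustration, assuming a structure of length $n$ is given by its arc list, with every label in $1,\dots,n$ that occurs in no arc treated as isolated; `lv1_shape` is a hypothetical name, and the function returns the length of the shape together with its relabeled arcs.

```python
def lv1_shape(n, arcs):
    """Core map, collapse runs of isolated vertices, relabel."""
    present = set(arcs)
    # core map: keep only the innermost arc of every stack
    core_arcs = [(i, j) for i, j in arcs if (i + 1, j - 1) not in present]
    paired = {v for arc in core_arcs for v in arc}
    dropped = {v for arc in arcs for v in arc} - paired
    kept = []
    for v in range(1, n + 1):
        if v in dropped:
            continue                  # endpoint of a collapsed stack arc
        if v not in paired and kept and kept[-1] not in paired:
            continue                  # later vertex of a run of isolated vertices
        kept.append(v)
    rank = {v: r for r, v in enumerate(kept, start=1)}
    return len(kept), sorted((rank[i], rank[j]) for i, j in core_arcs)
```

For example, `lv1_shape(15, [(1, 12), (2, 11), (4, 9), (5, 8)])` returns a shape of length 8 with arcs `[(1, 7), (3, 5)]`: both stacks shrink to one arc and each unpaired region shrinks to one vertex, so, unlike the $\mathsf{lv}_k^5$-shape, the positions of unpaired regions are still visible.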
Figure 7: $\mathsf{lv}_k^1$-shapes via the core map and subsequent identification of unpaired
nucleotides: A 3-noncrossing, 1-canonical RNA structure (top-left) is mapped into
its $\mathsf{lv}_3^1$-shape (top-right).

Let $\mathcal{C}_k(n,h)$ ($C_k(n,h)$) denote the set (number) of $k$-noncrossing
core-structures of length $n$ with exactly $h$ arcs. Let $\mathcal{J}_k(n,h)$ ($j_k(n,h)$) denote
the set (number) of $\mathsf{lv}_k^1$-shapes of length $n$ with $h$ arcs, and let $j_k(n)$ be the
number of all $\mathsf{lv}_k^1$-shapes of length $n$ and set

$$ \mathbf{J}_k(z,u) = \sum_{h\ge 0}\sum_{n=2h}^{4h+1} j_k(n,h)\,z^{n}u^{h} \quad\text{and}\quad \mathbf{J}_k(z) = \sum_{n\ge 0} j_k(n)\,z^{n}. \tag{31} $$

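Theorem 4.1. For $k, n, h \in \mathbb{N}$, $k \ge 2$, the following assertions hold
(a) the generating functions $\mathbf{J}_k(z,u)$ and $\mathbf{J}_k(z)$ are given by

$$ \mathbf{J}_k(z,u) = \frac{(1+z)(1+uz^2)}{uz^3+2uz^2+1}\,F_k\!\left(\frac{(1+z)^2(1+uz^2)\,uz^2}{(uz^3+2uz^2+1)^2}\right), \tag{32} $$

$$ \mathbf{J}_k(z) = \frac{(1+z)(1+z^2)}{z^3+2z^2+1}\,F_k\!\left(\frac{(1+z)^2(1+z^2)\,z^2}{(z^3+2z^2+1)^2}\right). \tag{33} $$

(b) for $2 \le k \le 9$, the number of $\mathsf{lv}_k^1$-shapes of length $n$ satisfies

$$ j_k(n) \sim c'_k\,n^{-((k-1)^2+(k-1)/2)}\,\left({\mu'_k}^{-1}\right)^{n}, \tag{34} $$

where $c'_k > 0$ and $\mu'_k$ is the unique minimum positive real solution of

$$ \frac{(1+z)^2(1+z^2)\,z^2}{(z^3+2z^2+1)^2} = \rho_k^2. \tag{35} $$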
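Proof. For (a) we consider the map between $k$-noncrossing cores having
exactly $h$ arcs and $\mathsf{lv}_k^1$-shapes, for $0 \le h \le \lfloor\frac{n-1}{2}\rfloor$,

$$ \varepsilon\colon \mathcal{C}_k(n,h) \to \dot{\bigcup}_{b_0\le b\le n-2h-1}\left[\mathcal{J}_k(n-b,h)\times\left\{(e_j)_{1\le j\le n-2h-b}\ \middle|\ \sum_{j=1}^{n-2h-b}e_j = b,\ e_j\ge 0\right\}\right], $$

where $b_0 = \max\{0, n-4h-1\}$. For every $\beta \in \mathcal{C}_k(n,h)$, $(e_j)_{1\le j\le n-2h-b}$ keeps
track of the multiplicities of the deleted isolated vertices. The map $\varepsilon$ is a
(well defined) bijection and

$$ \left|\left\{(e_j)_{1\le j\le n-2h-b}\ \middle|\ \sum_{j=1}^{n-2h-b}e_j = b,\ e_j\ge 0\right\}\right| = \binom{n-2h-1}{b}. $$

We arrive at

$$ C_k(n,h) = \sum_{b=b_0}^{n-2h-1}\binom{n-2h-1}{b}\,j_k(n-b,h), \qquad 0\le h\le \left\lfloor\frac{n-1}{2}\right\rfloor. $$

We compute

$$ \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor} C_k(n,h)\,x^{n}y^{h} = \underbrace{\sum_{n\ge 0}\ \sum_{h>\lfloor (n-1)/2\rfloor}^{\lfloor n/2\rfloor} C_k(n,h)\,x^{n}y^{h}}_{(I)} + \underbrace{\sum_{n\ge 0}\sum_{h=0}^{\lfloor (n-1)/2\rfloor}\sum_{b=b_0}^{n-2h-1}\binom{n-2h-1}{b}\,j_k(n-b,h)\,x^{n}y^{h}}_{(II)} $$

and rewrite (II) as

$$ \sum_{n\ge 0}\sum_{h=0}^{\lfloor (n-1)/2\rfloor}\sum_{b=b_0}^{n-2h-1}\binom{n-2h-1}{b}\,j_k(n-b,h)\,x^{n}y^{h} = \sum_{h\ge 0}\sum_{b\ge 0}\sum_{n=2h+b+1}^{4h+b+1}\binom{n-2h-1}{b}\,j_k(n-b,h)\,x^{n}y^{h}. $$

We derive, setting $s = n-b$,

$$ \sum_{h\ge 0}\sum_{b\ge 0}\sum_{s=2h+1}^{4h+1}\binom{s+b-2h-1}{b}\,j_k(s,h)\,x^{s+b}y^{h} = \sum_{h\ge 0}\sum_{s=2h+1}^{4h+1} j_k(s,h)\left(\sum_{b\ge 0}\binom{s+b-2h-1}{b}x^{b}\right)x^{s}y^{h} = \sum_{h\ge 0}\sum_{s=2h+1}^{4h+1} j_k(s,h)\left((1-x)^2y\right)^{h}\left(\frac{x}{1-x}\right)^{s}. $$

In view of $j_k(2h,h) = C_k(2h,h)$, we can interpret (I) as follows

$$ \sum_{n\ge 0}\ \sum_{h>\lfloor (n-1)/2\rfloor}^{\lfloor n/2\rfloor} C_k(n,h)\,x^{n}y^{h} = \sum_{h\ge 0} j_k(2h,h)\left((1-x)^2y\right)^{h}\left(\frac{x}{1-x}\right)^{2h}, $$

which allows for extending the parameter range of $s$

$$ \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor} C_k(n,h)\,x^{n}y^{h} = \sum_{h\ge 0}\sum_{s=2h}^{4h+1} j_k(s,h)\left((1-x)^2y\right)^{h}\left(\frac{x}{1-x}\right)^{s}. $$

Setting $u = (1-x)^2y$ and $z = \frac{x}{1-x}$, we obtain the bivariate generating
function

$$ \sum_{h\ge 0}\sum_{s=2h}^{4h+1} j_k(s,h)\,z^{s}u^{h} = \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor} C_k(n,h)\left(u(1+z)^2\right)^{h}\left(\frac{z}{1+z}\right)^{n}. $$

We next consider two power series relations due to [10] and [?]

$$ \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor}\mathsf{T}_{k,1}(n,h)\,v^{h}x^{n} = \frac{1}{vx^2-x+1}\,F_k\!\left(\frac{vx^2}{(vx^2-x+1)^2}\right), \tag{36} $$

$$ \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor}\mathsf{T}_{k,1}(n,h)\,v^{h}x^{n} = \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor} C_k(n,h)\left(\frac{v}{1-vx^2}\right)^{h}x^{n}. \tag{37} $$

In view of eq. (36) and eq. (37), substituting $y = \frac{v}{1-vx^2}$, we obtain

$$ \sum_{n\ge 0}\sum_{h=0}^{\lfloor n/2\rfloor} C_k(n,h)\,x^{n}y^{h} = \frac{1+yx^2}{1-x+2yx^2-yx^3}\,F_k\!\left(\frac{yx^2(1+yx^2)}{(1-x+2yx^2-yx^3)^2}\right), $$

and with $x = \frac{z}{1+z}$, $y = u(1+z)^2$ we can conclude

$$ \sum_{h\ge 0}\sum_{s=2h}^{4h+1} j_k(s,h)\,z^{s}u^{h} = \frac{(1+z)(1+uz^2)}{uz^3+2uz^2+1}\,F_k\!\left(\frac{(1+z)^2(1+uz^2)\,uz^2}{(uz^3+2uz^2+1)^2}\right) $$

and in particular, setting $u = 1$,

$$ \mathbf{J}_k(z) = \frac{(1+z)(1+z^2)}{z^3+2z^2+1}\,F_k\!\left(\frac{(1+z)^2(1+z^2)\,z^2}{(z^3+2z^2+1)^2}\right), $$

whence assertion (a).

Assertion (b) follows in complete analogy to the proof of Theorem 3.2. First
we verify that the factor $\frac{(1+z)(1+z^2)}{z^3+2z^2+1}$ does not introduce a dominant singularity
of $\mathbf{J}_k(z)$. Then we verify, using Tab. 1, that the unique dominant singularity
of $F_k\!\left(\frac{(1+z)^2(1+z^2)\,z^2}{(z^3+2z^2+1)^2}\right)$ is the minimum positive real solution of
$\frac{(1+z)^2(1+z^2)\,z^2}{(z^3+2z^2+1)^2} = \rho_k^2$ for $2 \le k \le 9$. Now (b) follows from Proposition 1. □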
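The closed form of eq. (33) can be compared with a direct enumeration for small lengths. The sketch below makes two assumptions that are not spelled out in this form in the text: eq. (33) is read as $\mathbf{J}_k(z)=\frac{(1+z)(1+z^2)}{z^3+2z^2+1}F_k\big(\frac{(1+z)^2(1+z^2)z^2}{(z^3+2z^2+1)^2}\big)$, and an $\mathsf{lv}_k^1$-shape of length $n$ is taken to be a $k$-noncrossing diagram on $n$ vertices with no 1-arcs, no stack of size greater than one and no two adjacent isolated vertices. All names (`poly`, `mul`, `inverse`, `j_series`, `brute_count`, ...) are hypothetical.

```python
from itertools import combinations

N = 9  # number of series coefficients to compare (z^0 .. z^8)

def poly(*coeffs):
    """Truncated power series with the given low-order coefficients."""
    return (list(coeffs) + [0] * N)[:N]

def mul(a, b):
    """Product of two truncated series."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b[:N - i]):
            out[i + j] += x * y
    return out

def inverse(a):
    """Reciprocal of a truncated series whose constant term is 1."""
    inv = [1] + [0] * (N - 1)
    for n in range(1, N):
        inv[n] = -sum(a[i] * inv[n - i] for i in range(1, n + 1))
    return inv

def j_series(f):
    """Coefficients of the assumed right-hand side of eq. (33);
    f[n] must hold f_k(2n, 0) for n = 0 .. 4."""
    d_inv = inverse(poly(1, 0, 2, 1))                 # 1 / (z^3 + 2z^2 + 1)
    one_plus_z = poly(1, 1)
    prefactor = mul(mul(one_plus_z, poly(1, 0, 1)), d_inv)
    inner = mul(mul(mul(one_plus_z, one_plus_z), poly(0, 0, 1, 0, 1)),
                mul(d_inv, d_inv))
    total, power = [0] * N, poly(1)
    for fn in f:
        total = [t + fn * p for t, p in zip(total, power)]
        power = mul(power, inner)
    return mul(prefactor, total)

def diagrams(vertices):
    """Yield every partial matching (list of arcs) on the sorted vertex list."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    yield from diagrams(rest)
    for idx, partner in enumerate(rest):
        for arcs in diagrams(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + arcs

def has_k_crossing(arcs, k):
    for subset in combinations(sorted(arcs), k):
        rights = [j for _, j in subset]
        if subset[-1][0] < rights[0] and rights == sorted(rights):
            return True
    return False

def is_lv1_shape(n, arcs, k):
    present = set(arcs)
    paired = {v for arc in arcs for v in arc}
    if any(j == i + 1 or (i + 1, j - 1) in present for i, j in arcs):
        return False          # 1-arc, or a stack of size greater than one
    if any(v not in paired and v + 1 not in paired for v in range(1, n)):
        return False          # two adjacent isolated vertices
    return not has_k_crossing(arcs, k)

def brute_count(n, k):
    return sum(is_lv1_shape(n, arcs, k)
               for arcs in diagrams(list(range(1, n + 1))))
```

With the numbers of 2-noncrossing matchings 1, 1, 2, 5, 14 (Catalan numbers) and of 3-noncrossing matchings 1, 1, 3, 14, 84 as input, the series coefficients can be compared term by term with `brute_count(n, k)` for $n < 9$; integer arithmetic is used throughout, so no rounding is involved.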
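We finally compute the number of $\mathsf{lv}_k^1$-shapes induced by $k$-noncrossing,
$\sigma$-canonical RNA structures of fixed length $n$, $\mathsf{lv}_{k,\sigma}^1(n)$, setting

$$ \mathbf{Lv}_{k,\sigma}^1(x) = \sum_{n\ge 0}\mathsf{lv}_{k,\sigma}^1(n)\,x^{n}. \tag{38} $$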

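Theorem 4.2. Let $k, \sigma \in \mathbb{N}$, where $k \ge 2$. Then the following assertions
hold

(a) the generating function $\mathbf{Lv}_{k,\sigma}^1(x)$ is given by

$$ \mathbf{Lv}_{k,\sigma}^1(x) = \frac{(1+x)(1+x^{2\sigma})}{(1-x)(x^{2\sigma+1}+2x^{2\sigma}+1)}\,F_k\!\left(\frac{(1+x)^2x^{2\sigma}(1+x^{2\sigma})}{(x^{2\sigma+1}+2x^{2\sigma}+1)^2}\right). \tag{39} $$

(b) for $2 \le k \le 9$ and $1 \le \sigma \le 10$, we have

$$ \mathsf{lv}_{k,\sigma}^1(n) \sim c'_{k,\sigma}\,n^{-((k-1)^2+(k-1)/2)}\,\left(\chi_{k,\sigma}^{-1}\right)^{n}, \tag{40} $$

where $c'_{k,\sigma} > 0$ and $\chi_{k,\sigma}$ is the unique minimum positive real solution of

$$ \frac{(1+x)^2x^{2\sigma}(1+x^{2\sigma})}{(x^{2\sigma+1}+2x^{2\sigma}+1)^2} = \rho_k^2. \tag{41} $$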

Proof. Obviously, we can inflate any structure by adding arcs into its
stacks or duplicating isolated vertices without changing its $\mathsf{lv}_k^1$-shape. As a
result, we can derive from any $\mathsf{lv}_k^1$-shape, by inflating its stacks to $\sigma$ arcs,
a unique, minimal, $k$-noncrossing, $\sigma$-canonical structure inducing it. This
observation implies

$$ \mathsf{lv}_{k,\sigma}^1(n) = \sum_{h=0}^{\lfloor n/(2\sigma)\rfloor}\ \sum_{s=2h}^{\min\{4h+1,\ n-2(\sigma-1)h\}} j_k(s,h), $$

whence we can rewrite the generating function

$$ \mathbf{Lv}_{k,\sigma}^1(x) = \sum_{h\ge 0}\sum_{s=2h}^{4h+1}\ \sum_{n\ge 2h(\sigma-1)+s} j_k(s,h)\,x^{n} = \frac{1}{1-x}\sum_{h\ge 0}\sum_{s=2h}^{4h+1} j_k(s,h)\,x^{2h(\sigma-1)+s}. $$

Employing eq. (32), we derive

$$ \mathbf{Lv}_{k,\sigma}^1(x) = \frac{(1+x)(1+x^{2\sigma})}{(1-x)(x^{2\sigma+1}+2x^{2\sigma}+1)}\,F_k\!\left(\frac{(1+x)^2x^{2\sigma}(1+x^{2\sigma})}{(x^{2\sigma+1}+2x^{2\sigma}+1)^2}\right) $$

and assertion (a) follows. As for assertion (b), we proceed in analogy to the
proof of Theorem 3.2 and verify that for $2 \le k \le 9$ and $1 \le \sigma \le 10$, the
unique minimum positive real solution, $\chi_{k,\sigma}$, of eq. (41) is the unique dominant
singularity of the generating function $\mathbf{Lv}_{k,\sigma}^1(x)$. Consequently, Proposition 1
implies that

$$ \mathsf{lv}_{k,\sigma}^1(n) \sim c'_{k,\sigma}\,n^{-((k-1)^2+(k-1)/2)}\,\left(\chi_{k,\sigma}^{-1}\right)^{n}, $$

where $c'_{k,\sigma}$ is some positive constant, whence (b) and the theorem is proved.

□

sigma / k   2         3         4         5         6         7         8
1           2.09188   4.51263   6.65586   8.73227   10.7804   12.8137   14.8381
2           1.56947   2.31767   2.81092   3.21184   3.55939   3.87079   4.15552
3           1.38475   1.80408   2.05600   2.24968   2.41081   2.55050   2.67477

Table 3: The exponential growth rates $\chi_{k,\sigma}^{-1}$ of $\mathsf{lv}_k^1$-shapes induced by $k$-noncrossing,
$\sigma$-canonical RNA structures of length $n$.

5. Conclusion

$\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes of $k$-noncrossing, $\sigma$-canonical RNA pseudoknot structures
provide a significant simplification of complicated molecular configurations
with cross-serial interactions. The asymptotic formulas presented in
Theorem 3.2 and Theorem 4.2
imply all asymptotic results on abstract shapes of secondary structures in
the literature (note that $n^{-((k-1)^2+(k-1)/2)} = n^{-3/2}$ for $k = 2$).

The growth rates of $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes of $k$-noncrossing, $\sigma$-canonical
structures are displayed in Tab. 4 and Tab. 5, where they are contrasted
with the exponential growth rates of $k$-noncrossing, $\sigma$-canonical structures.

k                                                 2         3         4         5         6         7         8
structures                                        1.96798   2.58808   3.03825   3.41383   3.74381   4.04195   4.31617
$\mathsf{lv}_k^1$-shapes ($\chi_{k,2}^{-1}$)      1.56947   2.31767   2.81092   3.21184   3.55939   3.87079   4.15552
$\mathsf{lv}_k^5$-shapes ($\zeta_{k,2}^{-1}$)     1.26585   1.93496   2.41152   2.80275   3.14338   3.44943   3.72983

Table 4: The exponential growth rates of arbitrary $k$-noncrossing, 2-canonical RNA
structures of length $n$ and the numbers of their induced $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes.



k                                                 2         3         4         5         6         7         8
structures                                        1.71599   2.04771   2.27036   2.44664   2.59554   2.72590   2.84267
$\mathsf{lv}_k^1$-shapes ($\chi_{k,3}^{-1}$)      1.38475   1.80408   2.05600   2.24968   2.41081   2.55050   2.67477
$\mathsf{lv}_k^5$-shapes ($\zeta_{k,3}^{-1}$)     1.17928   1.55752   1.80082   1.98945   2.14693   2.28376   2.40567

Table 5: The exponential growth rates of arbitrary $k$-noncrossing, 3-canonical RNA
structures of length $n$ and the numbers of their induced $\mathsf{lv}_k^1$- and $\mathsf{lv}_k^5$-shapes.



Table 5 shows that the exponential growth rates of the shapes of $k$-noncrossing,
3-canonical structures are significantly smaller than those of all $k$-noncrossing,
3-canonical structures. Therefore, the abstract shapes represent a meaningful
reduction. At http://www.combinatorics.cn/cbpc/paper.html we
provide supplemental material for our results.